Abstract A researcher collaborating with many groups will normally have
more papers (and thus higher citations and h-index) than a researcher
spending all his/her time working alone or in a small group. While analyzing
an author’s research merit, it is therefore not enough to consider only the
collective impact of the published papers; it is also necessary to quantify
his/her share in the impact. For this quantification, here I propose the
I-index, which is defined as an author’s percentage share in the total
citations that his/her papers have attracted. It is argued that this I-index
does not directly depend on most of the subjective issues like an author’s
influence, affiliation, seniority or career break. A simple application of the
Central Limit Theorem shows that the scheme of equidistribution of credit
among the coauthors of a paper gives the most probable value of the I-index
(with an associated small standard deviation which decreases with increasing
h-index). I show that the total citations ($N_c$), the h-index and the I-index
are three independent parameters (within their bounds), and together they give
a comprehensive idea of an author’s overall research performance.

Keywords Coauthors’ contributions • Independent parameters • Central 
Limit Theorem 


1 Introduction 

In this age of increasing specialization, it has become almost impossible to
go through all the research works of an author and judge their merits. This
inability necessitates an objective analysis of an author’s research output so
that a wider population can comprehend his/her research merit. This objective
analysis is also very helpful in comparing the research outputs of different
authors, and has become an important tool for employers, policy makers and
grant commissions.

How do we objectively and comprehensibly analyze an author’s research
merit? Clearly, for such an analysis many factors should be considered: the
quality and quantity of the research output, coauthors’ contributions to a
researcher’s work, his/her ability to do independent work, a researcher’s
efficiency in doing collaborative work, his/her ability to work in different
fields, etc. It is possible to carefully define different parameters/metrics to
quantify each of the above aspects of an author’s research performance. To
reflect on a particular aspect, if I may use physics terms, one is required to
extract the “coarse-grained” information out of a huge amount of “microscopic”
details associated with an author’s publications and their impacts.

At this point it is important to realize that a single parameter or metric
cannot give a full view of an author’s scholarship or research merit. As
mentioned above, different parameters can be defined to judge different
aspects of an author’s scholarly output. For an efficient and objective
description of an individual’s overall research performance, it is therefore
crucial to recognize the most important aspects of research output and
separately quantify them by carefully defined parameters. These parameters
are expected to be independent to avoid redundancy, and it is also expected
that they will have some simple physical meaning so that they can be
comprehended by the wider population.

In this work we identify the three most important aspects of an author’s
research output: (a) quantity, (b) quality and (c) the author’s own
contribution to his/her published works. In other words, these three aspects
are the collective impact of the published papers, the author’s productivity
and the author’s share in the total impact of his/her works. Clearly we need
at least three independent parameters/metrics to reliably quantify these three
different aspects. What are the three independent parameters which best serve
this cause? I will argue in this paper that the total number of citations
($N_c$), the h-index and the newly defined I-index do the job satisfactorily.

Let me now briefly discuss why the division of credit among the coauthors
is so important, and why $N_c$ and the h-index are not enough to analyze an
author’s scholarly activity. It is not uncommon for senior and established
researchers to collaborate with many groups and publish a large number of
papers per year. These researchers will normally have higher citations ($N_c$)
and h-index than those who spend all their time working alone or in a small
group. It would be greatly unfair to these lone or small-group workers if an
author’s research performance were analyzed only by the parameters $N_c$ and
the h-index. It is therefore necessary to quantify a researcher’s own role in
his/her success or, in other words, how much the researcher could have
achieved if he/she had worked independently. Here I propose the I-index (it
can be interpreted as the Independence-Index) to solve this problem. We will
see in the next section that this index has a simple meaning which will appeal
to the wider population. It is defined in such a way that its value will not
directly depend on most of the subjective issues like an author’s
popularity/influence, affiliation, seniority and career break/low activity
(due to some severe medical condition, family tragedy or, importantly, a
female researcher’s motherhood). It is also argued in this paper that a simple
scheme of equidistribution of credit among the coauthors of a paper will not
normally result in a significant error in calculating the I-index. We will see
that $N_c$, the h-index and the I-index are three independent parameters
(within their bounds), and together can give a comprehensive idea about a
researcher’s overall performance (see Sec. 3).

There is an additional advantage in considering the I-index while analyzing
an individual’s research output. Parameters like $N_c$ and the h-index can be
unethically inflated in different ways. For example, a number of researchers
working in several independent groups can decide that when a group publishes
a paper, it will give authorship to members of the other groups even when
they do not contribute. It is often complained that junior authors are
sometimes compelled to give authorship to senior non-contributing researchers
for sub-academic reasons. This unethical practice will be discouraged if the
I-index is considered while analyzing an individual’s research performance.

It is often stated that, even though it is very important to quantify an
author’s own share in the total credit of his/her published papers, doing so
may discourage researchers from true collaborations, which are imperative
for the progress and betterment of science. This crucial issue can be mostly
resolved if we quantify the three different research aspects by three separate
independent parameters. In this three-parameter framework of research
analysis, researchers will be encouraged to do effective collaborations to
improve their $N_c$ and h-index. At the same time they will probably be
restrained from resorting to the unethical practices mentioned above if the
I-index is also considered along with the other two indices. In this framework
of analysis, authors can still be ranked according to their h values
(supplemented by $N_c$); the I-index can help resolve the ranking when
multiple authors have close values of the h-index (and $N_c$). In fact,
researchers can be ranked in different ways depending on what importance is
given to the I-index (for more discussion, see Sec. [...]).

In this work, not much importance is given to describing all three aspects
of research performance by a single parameter or metric. Here I may emphasize
that any attempt to do so would be crude owing to a serious loss of
information. The obscurity or ambiguity resulting from this loss may
eventually lead to errors of judgement; as a consequence, a group of
scientists may get undue advantage while the deserving candidates may be
penalized. For example, though the h-index [1] somewhat successfully
quantifies the first two aspects of an author’s research output, the ħ-index
[2], which additionally attempts to consider the coauthors’ role, is not that
successful. Besides losing simple meaning and calculation friendliness, the
ħ-index is known to be unfair towards junior researchers and extra biased
towards senior (having high h-index) researchers. Three carefully defined
independent parameters would provide a much better view (higher resolution)
of a researcher’s scholarly activity than any single parameter can possibly
do. With these facts in mind, one may also like to know what parameter we
should use if for some practical reasons it is necessary to rank authors by a
single parameter. For this purpose, in Sec. [...] I define a normalized
h-index which combines the effects/impacts of both the h-index and the I-index
in a rational way. This normalized index is interpreted as the possible
h-index of an author if he/she had worked alone. Subsequently I also propose a
variant of it which additionally takes care of the seniority issue.


2 I-index: definition and characteristics

Before I define and discuss the I-index, I will first briefly deliberate on
the two main assumptions considered in this work:

(1) The impact of a paper is solely determined by the number of citations it 
received. This number of citations is the total credit to be distributed among 
the coauthors of the paper. 

(2) For a multi-author paper, each author is indispensable and effectively
contributes equally if not mentioned otherwise.

While the first assumption is somewhat easy to comprehend, the second
assumption needs some discussion. I will argue and try to establish in this
work that, even though the assumption of equidistribution of credit may not be
satisfactory when applied to a single publication, it becomes quite a
reasonable assumption when applied to all the publications by an individual to
determine his/her overall share in the total citations received by those
publications.

Controversies and debates over credit distribution are not rare. Despite
the fact that it is crucial to distribute the credit among the coauthors, the
demarcation of contributions is a hopelessly difficult job. Sometimes even for
the coauthors it appears impossible to decide who contributed what and what
weight it carries. Sometimes an author’s contribution may be small but
indispensable, without which the paper would not be complete and published.
Sometimes a senior author’s direct contribution to a paper is small, but we
have to remember that he/she generally spends a lot of time writing project
proposals and bringing in funding, without which there will be no research and
no paper. Any ‘logical’ distribution of credit among the coauthors of a paper
is highly subjective and hence debatable. Different experts evaluating a
multi-author paper would give different credits to a particular author
depending on how the evaluation was done. This discussion clearly shows that,
due to the inherent subjective nature of the analysis, we cannot have a
satisfactory deterministic model for quantifying an individual’s share in the
total credit of his/her published papers (the third aspect of research output,
as mentioned before). If we define an index/metric to quantify this aspect of
research output, and a large number of experts independently estimate the
value of the index for an individual, then they will get different values for
the index. Due to this inevitable randomness (or uncertainty) in the estimated
value of the index, we need to develop a realistic statistical model to
predict the most probable (or expectation) value of the index. Here I define
the I-index to quantify an individual’s share in the total credit of his/her
works. I then discuss two relevant statistical models (two different
statistical approaches), and show that, within the domain of their validity,
the scheme of equidistribution of credit gives the most probable value of the
I-index. Frequently in this paper the most probable value of the index is
simply referred to as the I-index of an individual. It may also be mentioned
here that the statistical arguments presented in this work are not generally
applicable to a junior author with only a few papers.

Definition: The I-index is an author’s percentage share in the total citations
received by his/her published papers. If $c_i$ is the number of citations
received by the $i$-th paper and $z_i$ is the author’s expected share of credit
for the paper, then his/her I-index is given by:

$$ I = \frac{\sum_{i=1}^{N_p} z_i}{\sum_{i=1}^{N_p} c_i} \times 100\%, \tag{1} $$

where $N_p$ is the total number of papers published by the author. Now if $n_i$
is the number of authors who contributed to the $i$-th paper, then, assuming
the equidistribution of credit among the coauthors, we have $z_i = c_i/n_i$.
Consequently, the author’s I-index would be

$$ I = \frac{\sum_{i=1}^{N_p} c_i/n_i}{N_c} \times 100\%, \tag{2} $$

where $N_c = \sum_{i=1}^{N_p} c_i$.
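
As a rough illustration of the equal-credit formula, Eq. (2), the following Python sketch computes the I-index from per-paper citation counts and author counts. The function name `i_index` and the three-paper record are hypothetical, chosen only for illustration; they are not taken from any real author's data.

```python
def i_index(citations, n_authors):
    """Return the I-index (in percent) under equal credit sharing, Eq. (2).

    citations[i] is c_i and n_authors[i] is n_i for the i-th paper.
    """
    if len(citations) != len(n_authors):
        raise ValueError("citations and n_authors must have the same length")
    total = sum(citations)  # N_c, the total number of citations
    if total == 0:
        raise ValueError("the I-index is undefined when no paper has been cited")
    # Equal-share credit: each of the n_i authors receives c_i / n_i.
    share = sum(c / n for c, n in zip(citations, n_authors))
    return 100.0 * share / total


# Hypothetical record: three papers with 30, 12 and 8 citations,
# written by 3, 1 and 4 authors respectively.
citations = [30, 12, 8]
n_authors = [3, 1, 4]
print(round(i_index(citations, n_authors), 1))
```

For this made-up record the equal shares are 10 + 12 + 2 = 24 out of 50 citations, so the sketch prints 48.0.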

In the following I will present two different statistical arguments to
demonstrate the effectiveness of the equidistribution-of-credit scheme in
calculating the I-index. After that I will discuss some of the main features
or characteristics of the index.

Argument (1): In short, here I will argue that the value given by Eq. (2) is
the most probable value or the expectation value of the I-index defined in
Eq. (1); the statistical error in calculating the I-index using Eq. (2) is not
normally significantly large.

Consider that a multi-author paper has $n$ coauthors and has received $c$
citations. Let $z^j$ be the $j$-th author’s expected share of credit for the
paper; it is possible to express this quantity in the following form:
$z^j = c/n + \epsilon^j$, where $\epsilon^j$ is the deviation of the author’s
share from the average share of the coauthors ($c/n$). Since the total credit
to be distributed among the $n$ authors is $c$, we must have
$\sum_{j=1}^{n} z^j = c$. This implies $\sum_{j=1}^{n} \epsilon^j = 0$. Now
using this relation and the fact that $z^j \geq 0$, we get the strict
mathematical bounds for $\epsilon^j$:
$-\frac{c}{n} \leq \epsilon^j \leq \frac{(n-1)c}{n}$. In practice we expect
the deviation $|\epsilon^j|$ to be small and within some fraction of the
average share (i.e., $|\epsilon^j| < c/n$). The relation
$\sum_{j=1}^{n} \epsilon^j = 0$ confirms that the quantity $\epsilon^j$ would
be positive for some authors and negative for others ($\epsilon^j$ can be
zero, of course). Now which authors deserve positive values of $\epsilon^j$
and which authors should get negative values? While this can be hard to
decide, it will not be unreasonable to assume here that, for an individual
author with many published papers, his/her $\epsilon^j$ will be positive for
some of his/her papers and negative for others. In other words, sometimes an
individual researcher’s contribution to a multi-author paper can be more than
the coauthors’ average contribution to the paper, while on other occasions
his/her contribution would be less than the average contribution. In the
following I will use this statistical property of $\epsilon^j$ to calculate an
individual’s expected share in the total citations received by his/her papers
(the superscript index $j$ will be dropped since we will focus on a particular
author).

Let a researcher’s expected share of credit for his/her $i$-th paper be
$z_i = c_i/n_i + \epsilon_i$, where $\epsilon_i$ is a small number
($|\epsilon_i| < c_i/n_i$). While $c_i/n_i \geq 0$ for all papers,
statistically $\epsilon_i$ would take positive values for some papers and
negative values for others. When $n_i = 1$, we have $\epsilon_i = 0$, since
for a single-author paper its sole author gets all the credit ($z_i = c_i$).
Now when we calculate the researcher’s total share in the collective credit of
his/her papers by summing $z_i$ over all the published papers, we get
$C_{share} = \sum_{i=1}^{N_p} c_i/n_i + E_r$, with
$E_r = \sum_{i=1}^{N_p} \epsilon_i$. Since $\epsilon_i$ is a small quantity
($|\epsilon_i| < c_i/n_i$) and statistically it takes both positive and
negative values, we expect that $E_r$ will generally be a very small number
when $N_p$ is large (i.e. $|E_r| \ll \sum_{i=1}^{N_p} c_i/n_i$). Therefore, if
we ignore $E_r$ and just take $C_{share} \approx \sum_{i=1}^{N_p} c_i/n_i$,
then the resultant error would normally be less than what one might expect
from this simple scheme of equidistribution of credit (an argument similar in
spirit can be found, in a somewhat different context, in Refs. [...]). While
calculating the I-index, this resultant error ($E_r$) will then be further
weakened due to the presence of the large denominator factor ($N_c$) in the
definition of the index (cf. Eq. (1)). We note that the possible statistical
error in calculating the I-index using Eq. (2) is
$\Delta = \frac{E_r}{N_c} \times 100\%$. This error is expected to be
negligible when $N_c$ becomes large.

Let us now try to get a rough estimate of $|\Delta|$ for an individual. First
consider that the author has $l$ significant papers, so that the total number
of citations for these $l$ papers is much larger than the total number of
citations for the rest of the papers (i.e.,
$\sum_{i=1}^{l} c_i \gg \sum_{i=l+1}^{N_p} c_i$ when papers are arranged in
descending order of citation count). The value of $l$ can be assumed to be the
h-index of the author. Furthermore, consider that $\bar{c}$ and $\bar{n}$ are
respectively the average number of citations and the average number of authors
for those $l$ significant papers. As we discussed before, in practice we
expect $|\epsilon_i|$ to be some percentage of the corresponding average
share, i.e., $|\epsilon_i| \sim \frac{\bar{c}}{\bar{n}} \times \frac{x_i}{100}$,
where $x_i$ may take any value between, say, 0 and 20. This allows us to write

$$ E_r = \sum_{i=1}^{N_p} \epsilon_i \approx \frac{\bar{c}}{100\,\bar{n}} \sum_{i=1}^{l} s_i x_i . $$

Here $s_i$ carries only the sign of $\epsilon_i$; if $\epsilon_i$ is positive
(negative), then $s_i = +1$ ($s_i = -1$). Now if we take $\bar{x}$ to be the
average value of the $x_i$’s for those $l$ significant papers, then
$E_r \approx \frac{\bar{x}\,\bar{c}}{100\,\bar{n}} \sum_{i=1}^{l} s_i$. With
$N_c = \sum_{i=1}^{N_p} c_i \approx l\bar{c}$, we get

$$ \Delta = \frac{E_r}{N_c} \times 100 \approx \frac{\bar{x}}{\bar{n}\, l} \sum_{i=1}^{l} s_i . $$

We note that, if an individual’s estimated contribution to a multi-author paper
is more (less) than the average contribution of the coauthors, then $s_i = +1$
($s_i = -1$). If all $s_i$’s are $+1$, then $\sum_{i=1}^{l} s_i = l$. At the
other extreme, if all $s_i$’s are $-1$, then $\sum_{i=1}^{l} s_i = -l$. In
principle, depending on the details of the author’s contributions to the $l$
significant papers, $\sum_{i=1}^{l} s_i$ can take any of the following
possible values: $\{-l, -l+2, -l+4, \cdots, l\}$. Since the value of $\Delta$
can be different depending on the value of $\sum_{i=1}^{l} s_i$, we will now
calculate an expected value of $\Delta$ for an individual author. Noticing
that a simple average over all possible values of $\Delta$ is 0, we will here
consider the root-mean-square value of $\Delta$ as its expected value. Once we
know this root-mean-square value (denoted as $|\Delta|$), we can say that an
individual’s percentage share of credit for his/her works would normally be
within $(I \pm |\Delta|)\%$, where the value of $I$ is given by Eq. (2). Now
to calculate $|\Delta|$, we first note that the $s_i$’s are independent
variables. This is because an author’s amount of contribution to one paper
does not presumably depend on his/her amount of contribution to another one.
This independence of variables allows us to use some simple statistical
results in estimating $|\Delta|$. Now, these $l$ independent variables can
take values in $2^l$ possible ways (each taken here to be equally likely). For
example, all the variables can be 1. This can happen in only one way
($\binom{l}{0}$) and in this case $\sum_{i=1}^{l} s_i = l$. Similarly, one
variable can be $-1$ and the rest can be 1. This can happen in $\binom{l}{1}$
ways and in this case $\sum_{i=1}^{l} s_i = l - 2$. In general $k$ variables
can be $-1$ and the remaining $(l-k)$ variables can be 1; this can happen in
$\binom{l}{k}$ ways and here $\sum_{i=1}^{l} s_i = l - 2k$. This counting
helps us write the desired quantity in the following way:

$$ |\Delta| \approx \frac{\bar{x}}{\bar{n}\, l} \left( \frac{1}{2^l} \sum_{i=0}^{l} \binom{l}{i} (l-2i)^2 \right)^{1/2} . $$

A simple calculation shows that
$\left( \frac{1}{2^l} \sum_{i=0}^{l} \binom{l}{i} (l-2i)^2 \right)^{1/2} = \sqrt{l}$.
Therefore, we get $|\Delta| \approx \frac{\bar{x}}{\bar{n}\sqrt{l}}$. Here we
see that the value of $|\Delta|$ gets smaller with increasing $l$ (and
$\bar{n}$). While a typical value of $|\Delta|$ is expected to be less than 1,
a typical value of $I$ is about 40. So here we conclude that the
equidistribution-of-credit scheme gives us a reasonably good value of the
I-index without much statistical error.
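
The counting step in Argument (1) can be checked numerically. The Python sketch below evaluates the root-mean-square of the sign sum over all $2^l$ equally weighted sign patterns, confirms that it equals $\sqrt{l}$, and then evaluates the resulting error estimate. The function names and the sample values (average deviation of 10%, 3 authors per paper, 20 significant papers) are assumptions made only for illustration.

```python
from math import comb, sqrt


def rms_sign_sum(l):
    """RMS of s_1 + ... + s_l over all 2**l equally weighted sign patterns.

    With k of the signs equal to -1 the sum is l - 2k, and this happens
    in comb(l, k) ways.
    """
    total = sum(comb(l, k) * (l - 2 * k) ** 2 for k in range(l + 1))
    return sqrt(total / 2 ** l)


def delta_rms(x_bar, n_bar, l):
    """Rough RMS error of the I-index, in percentage points."""
    return x_bar / (n_bar * l) * rms_sign_sum(l)


# The binomial sum reduces to sqrt(l) for every l tried here.
for l in (5, 20, 50):
    assert abs(rms_sign_sum(l) - sqrt(l)) < 1e-9

# Illustrative (assumed) values: 10% average deviation, 3 authors, 20 papers.
print(round(delta_rms(x_bar=10, n_bar=3, l=20), 2))
```

With these assumed inputs the sketch prints 0.75, i.e. an uncertainty below one percentage point, in line with the estimate quoted in the text.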

Argument (2): It is possible to give a somewhat better mathematical argument,
based on the Central Limit Theorem [5] (CLT), to show that the value obtained
from Eq. (2) is the most probable value of the I-index (with an associated
small standard deviation which decreases with the increasing number of
significant papers). A very careful analysis of the situation is needed here.
As we discussed earlier, due to the inherent subjective nature of the
analysis, it is hardly possible to decide who gets how much credit for a
multi-author paper. If different experts independently evaluate the
distribution of credit among the coauthors of a paper, then a particular
author will get different values of credit from the different experts
depending on how the evaluation was done. So the I-index for a researcher,
defined in Eq. (1), will have different values when calculated by different
experts. Which value shall we take? It would be recommended to take an average
of these values. So what is the average or expectation value of the I-index if
a large number of experts independently calculate it? Using the Central Limit
Theorem we will now show that, under some reasonable assumptions, the average
value of the I-index is what one gets by the scheme of equidistribution of
credit (cf. Eq. (2)). We are also interested in knowing the standard deviation
about the average value, since a small deviation will allow us to say
confidently that the average value is an individual’s share of credit without
much uncertainty.

When a large number of experts independently evaluate the sharing of credit
for a multi-author paper, the values of credit obtained by a particular author
will follow some distribution. That is to say, an author will get a certain
credit with some probability. Let the $j$-th author of the $i$-th paper get
credit $y_i^j$ with the (marginal) probability density $p_i^j(y_i^j)$. In the
joint probability distribution of credits (for a particular paper $i$), the
variables $y_i^j$ are not totally independent; they obey a single constraint:
$\sum_{j=1}^{n_i} y_i^j = c_i$ (with $0 \leq y_i^j \leq c_i$). So we see that
the (random) variables $y_i^j$ for different $j$’s are not independent, even
though the $y_i^j$’s are totally independent variables for different $i$’s
(for an individual author $j$). This makes it easier to apply statistical
theory to determine the probability distribution for the I-function defined
for a specific author:

$$ I(Y) = \frac{\sum_{i=1}^{N_p} y_i}{N_c} \times 100. \tag{3} $$

Note that the author index $j$ is dropped from the credit variables $y_i$ as
we are focusing on a particular author. The symbol $Y$ denotes the sum of all
the random variables ($y_i$’s). It may be noted that the I-function, defined
for an individual, does not give a single value since each variable $y_i$
follows some distribution. The I-function gives a value with some probability;
we are interested in knowing the average value of the I-function and the
standard deviation associated with it.

Before we go further, let us briefly discuss what the CLT tells us. Let $X_1$,
$X_2$, $\cdots$, $X_n$ be $n$ independent random variables with arbitrary
distributions, each with a well-defined mean ($E[X_i] = \mu_i$) and a
well-defined variance ($\mathrm{var}(X_i) = \sigma_i^2$). Now consider the
function $Y = \sum_{i=1}^{n} X_i$. The CLT assures us that, in the limit of
large $n$ (and provided no single term dominates the sum, as required by the
Lindeberg condition), the values of $Y$ will follow a normal or Gaussian
distribution with a mean given by $E[Y] = \sum_{i=1}^{n} \mu_i$ and a variance
given by $\mathrm{var}(Y) = \sum_{i=1}^{n} \sigma_i^2$. This result from the
CLT does not depend on the details of the distributions of the $X_i$’s, and is
often valid even for small $n$.

Since the variables $y_i$ are essentially independent, in the limit of a large
number of papers ($N_p$) we can use the above statistical results to assure
ourselves that the I-function will be Gaussian in nature, with a mean and
variance that can be given in terms of the means and variances of the
variables $y_i$. To make things more quantitative, we now need to consider the
means and the variances of the $y_i$’s. Since the variable $y_i$ can take any
value between 0 and $c_i$, and there are $n_i$ authors to share the total
credit $c_i$, a reasonable assumption would be to take the mean value of the
variable $y_i$ to be $c_i/n_i$ (note: if we sum this over all coauthors of the
$i$-th paper, we get back the total credit $c_i$). In fact, even if the mean
value of $y_i$ is not strictly $c_i/n_i$, we will still normally have the same
results that follow. The argument for this will be given soon after I write
down the mean and variance of the I-function. Since the range of the variable
$y_i$ is finite, its variance will also be finite (for any regular
distribution); let us for the time being take $\sigma_i^2$ ($< \infty$) to be
its variance. If we now use the CLT results for $I(Y)$, we get the following:
in the limit of large $N_p$, the values of $I(Y)$ will follow a normal or
Gaussian distribution with the mean
$\frac{100}{N_c} \sum_{i=1}^{N_p} c_i/n_i$ (i.e. the I-index defined in Eq.
(2)) and the variance
$\Sigma^2 = \frac{100^2}{N_c^2} \sum_{i=1}^{N_p} \sigma_i^2$. It may be noted
that here we have used the following two general relations:
$E[aX_i] = aE[X_i]$ and $\mathrm{var}(aX_i) = a^2\,\mathrm{var}(X_i)$, where
$a$ is any constant.

Now I will argue that even if the mean of $y_i$ is not strictly $c_i/n_i$, we
will still normally have the same mean for the I-function. The reasoning goes
exactly like Argument (1) given before. Statistically, for some variables the
corresponding mean can be more than $c_i/n_i$ (i.e., $E[y_i] > c_i/n_i$) and
for others the mean can be less than that (i.e., $E[y_i] < c_i/n_i$).
Therefore when we calculate the sum of the means of the $y_i$’s, we expect
that the result will not be much different from $\sum_{i=1}^{N_p} c_i/n_i$.
Whatever (small) difference there might be will be further weakened by the
large denominator factor $N_c$ present in the definition of the I-function.
So here we conclude that, in all normal cases, the mean value of the
I-function is $\frac{100}{N_c} \sum_{i=1}^{N_p} c_i/n_i$ without much
significant deviation.

Now we will analyze whether the I-function has a broad or narrow peak about
its mean value. A narrow peak about the mean value will allow us to say
confidently that an author’s I-index is what one gets from Eq. (2).


For the distribution of $y_i$, the standard deviation $\sigma_i$ is expected
to depend on $c_i$ (this is because normally the larger the range of a
variable, the wider the distribution; here the variable $y_i$ varies from 0 to
$c_i$). We assume $\sigma_i$ to be some percentage of the mean value
$c_i/n_i$ of the distribution, i.e.,
$\sigma_i \sim \frac{c_i}{n_i} \times \frac{x_i}{100}$ ($x_i$ takes values
between, say, 0 and 20). Let us now consider that an author has $l$
significant papers, so that the total number of citations for these $l$ papers
is much larger than the total number of citations for the rest of the papers
(i.e., $\sum_{i=1}^{l} c_i \gg \sum_{i=l+1}^{N_p} c_i$ when papers are
arranged in descending order of citation count). The value of $l$ can be
assumed to be the h-index of the author. If $\bar{c}$ and $\bar{n}$ are
respectively the average number of citations and the average number of authors
for those $l$ significant papers, then
$N_c = \sum_{i=1}^{N_p} c_i \approx l\bar{c}$ and

$$ \sum_{i=1}^{N_p} \sigma_i^2 \approx \sum_{i=1}^{l} \frac{c_i^2}{n_i^2} \frac{x_i^2}{100^2} \approx l\, \frac{\bar{c}^2}{\bar{n}^2} \frac{\bar{x}^2}{100^2}, $$

where $\bar{x}$ is the average value of $x_i$ for those $l$ significant
papers. This implies that

$$ \Sigma^2 = \frac{100^2}{N_c^2} \sum_{i=1}^{N_p} \sigma_i^2 \approx \frac{100^2}{l^2 \bar{c}^2} \times l\, \frac{\bar{c}^2 \bar{x}^2}{\bar{n}^2\, 100^2} = \frac{\bar{x}^2}{\bar{n}^2\, l}, \quad \text{or} \quad \Sigma \approx \frac{\bar{x}}{\bar{n}\sqrt{l}} . $$

We see that the value of $\Sigma$ gets smaller with an increase in the values
of $l$ (or h-index) and $\bar{n}$. A typical value of the standard deviation
$\Sigma$ is expected to be less than 1, whereas a typical value of the mean of
the I-function is about 40. So we conclude that normally the I-function
defined in Eq. (3) has a very sharp Gaussian distribution about its mean value
given by the I-index (cf. Eq. (2)). This allows us to say that the most
probable value of the I-index can be obtained by a simple scheme of
equidistribution of credit among the coauthors of a paper. The uncertainty
(statistical standard deviation) associated with the value is normally very
small (especially for authors with high h-index).
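
Argument (2) can be explored with a small Monte Carlo experiment. In the Python sketch below, each hypothetical expert assigns the author a credit $y_i$ for every paper, drawn around the equal share $c_i/n_i$ with a spread of $x_i$ percent of that share, and the I-function of Eq. (3) is evaluated per expert. The Gaussian shape of the expert scatter, the common value $x_i = 10$, the ten-paper record and the function name are all assumptions made for illustration; the argument in the text leaves the distribution of $y_i$ unspecified.

```python
import math
import random
import statistics


def simulate_i_function(citations, n_authors, x_percent=10.0,
                        n_experts=20000, seed=0):
    """Sample I(Y) over many hypothetical experts.

    Returns (simulated mean, simulated std, std predicted from the
    sum of the per-paper variances).
    """
    rng = random.Random(seed)
    n_c = sum(citations)
    # Mean credit per paper: the equal share c_i / n_i.
    means = [c / n for c, n in zip(citations, n_authors)]
    # Assumed spread: x_percent of the mean share; a sole author has none.
    sigmas = [0.0 if n == 1 else m * x_percent / 100.0
              for m, n in zip(means, n_authors)]

    samples = []
    for _ in range(n_experts):
        y_total = sum(rng.gauss(m, s) for m, s in zip(means, sigmas))
        samples.append(100.0 * y_total / n_c)

    predicted_std = 100.0 / n_c * math.sqrt(sum(s * s for s in sigmas))
    return statistics.fmean(samples), statistics.stdev(samples), predicted_std


# Hypothetical record of ten papers (citations and author counts are made up).
citations = [40, 35, 30, 28, 25, 22, 20, 18, 15, 12]
n_authors = [3, 2, 4, 1, 3, 5, 2, 3, 4, 2]

mean_i, std_i, predicted_std = simulate_i_function(citations, n_authors)
i_eq2 = 100.0 * sum(c / n for c, n in zip(citations, n_authors)) / sum(citations)
print(f"Eq. (2): {i_eq2:.2f}")
print(f"simulated mean: {mean_i:.2f}, simulated std: {std_i:.2f}, "
      f"predicted std: {predicted_std:.2f}")
```

Under these assumptions the simulated mean should land close to the Eq. (2) value, and the simulated spread close to the value predicted from the summed variances.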

In the following I will discuss some of the main features/characteristics of
the I-index.

Characteristic (a): Unlike the h-index or $N_c$ ($= \sum_{i=1}^{N_p} c_i$),
the I-index is expected to be a very slowly varying function of time. The
h-index is linear in time while $N_c$ is quadratic in time [1]. Similar to
$N_c$, $C_{share} = \sum_{i=1}^{N_p} c_i/n_i$ is also expected to be quadratic
in time, since both $N_c$ and $C_{share}$ are essentially linear sums of the
$c_i$’s (see the argument given in Ref. [1]). Now we assume that
$N_c = a_1 t + a_2 t^2$ and $C_{share} = b_1 t + b_2 t^2$, where $t$ is the
career span of a scientist (see Sec. [...]), and $a_1$, $a_2$, $b_1$ and $b_2$
are some constants (author dependent). This leads us to $I$ as the following
function of time:
$I = 100 \times \frac{C_{share}}{N_c} = 100 \times \frac{b_1 t + b_2 t^2}{a_1 t + a_2 t^2}$,
or $I = 100 \times \frac{b_1 + b_2 t}{a_1 + a_2 t}$. It is now easy to see why
the I-index is expected to be a very slowly varying function of time: for
large $t$ it approaches the constant $100 \times b_2/a_2$.
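
The slow time dependence can be made concrete with a short Python sketch that evaluates the ratio of the two quadratic growth laws at several career spans. The function name and the four constants are hypothetical and were picked only to show the qualitative behaviour; they are not fitted to any author.

```python
def i_index_at(t, a1, a2, b1, b2):
    """I(t) = 100 * (b1*t + b2*t**2) / (a1*t + a2*t**2) for career span t > 0."""
    if t <= 0:
        raise ValueError("career span t must be positive")
    n_c = a1 * t + a2 * t ** 2       # total citations N_c(t)
    c_share = b1 * t + b2 * t ** 2   # author's share C_share(t)
    return 100.0 * c_share / n_c


# Hypothetical, author-dependent constants chosen only for illustration.
a1, a2, b1, b2 = 10.0, 5.0, 5.0, 2.0
for t in (5, 10, 20, 40):
    print(t, round(i_index_at(t, a1, a2, b1, b2), 1))
```

With these assumed constants, $N_c$ grows from 175 at $t = 5$ to 8400 at $t = 40$, while the I-index only drifts from about 42.9 to about 40.5, approaching $100 \times b_2/a_2 = 40$.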

For a similar reason, the I-index will not be much affected by a career break
or low activity (due to some severe medical condition, family tragedy or,
importantly, a female researcher’s motherhood). We note that a career
break/low activity would affect both $N_c$ and $C_{share}$ in a similar way.
So their ratio, i.e. the I-index, is expected to be mostly free of the effects
caused by these important subjective issues.

Characteristic (b): Normally an author’s affiliation, seniority or popularity
affects the citations ($c_i$’s) received by his/her papers. As a result both
$C_{share} = \sum_{i=1}^{N_p} c_i/n_i$ and $N_c = \sum_{i=1}^{N_p} c_i$ would
depend on those factors. Since both quantities, $C_{share}$ and $N_c$, are
linear functions of the $c_i$’s, we expect that both of them will be
influenced in a similar way by those factors. Now as the I-index is defined as
the ratio between those two quantities, it is expected that those subjective
issues will not help better one’s I-index. The essential functional difference
between the I-index and $N_c$ or the h-index is that, unlike the latter two,
the I-index is a relative quantity which effectively quantifies what fraction
of the total credit an individual is entitled to get for his/her papers. Being
a relative quantity, the I-index is expected to be mostly independent of all
the subjective issues mentioned.

Because of the properties of the I-index stated above (cf. (a) and (b)), it
will not be unfair to compare values of this index for authors with different
seniorities or affiliations/popularities.

Characteristic (c): The I-index can only be improved if a researcher starts
publishing impactful single-author or few-author papers. Here it may be
noted that even if someone manages to improve his/her h-index and $N_c$ by
entering a large number of collaborations, the I-index may not increase in
this way, and sometimes it may decrease! Unlike $N_c$ or the h-index, the
I-index is not a monotonically increasing function of time. For example, its
value may decrease if a paper with a large number of authors starts getting
highly cited or a researcher starts publishing a large number of highly
collaborative works.

Characteristic (d): Unlike $N_c$ and the h-index, the I-index is a bounded
parameter. We see from Eq. (2) that, if $n_i = 1$ for all $i$, then
$I = 100\%$, and if the $n_i$’s are very large, then $I$ will be very small.
For any author this index takes a value between 0 and 100. Theoretically,
$0\% < I \leq 100\%$ for any fixed non-zero values of $N_c$ and the h-index.
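
Characteristics (c) and (d) can be illustrated with a made-up three-paper record. The Python sketch below recomputes the equal-credit I-index before and after a ten-author paper becomes highly cited, and also checks the upper bound for an author who always publishes alone. The function and all numbers are hypothetical and serve only as an illustration.

```python
def i_index(citations, n_authors):
    """Equal-credit I-index in percent, following Eq. (2)."""
    total = sum(citations)  # N_c
    return 100.0 * sum(c / n for c, n in zip(citations, n_authors)) / total


# Hypothetical author with one solo paper, one two-author paper
# and one ten-author paper.
n_authors = [1, 2, 10]

before = i_index([20, 20, 20], n_authors)   # all papers equally cited
after = i_index([20, 20, 200], n_authors)   # the ten-author paper takes off

print(round(before, 1), round(after, 1))

# An author who only writes single-author papers reaches the upper bound.
print(i_index([3, 7], [1, 1]))
```

In this made-up case $N_c$ rises from 60 to 240 while the I-index falls from about 53.3 to about 20.8, and the solo author sits exactly at 100.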


3 The triplet: $N_c$, h-index, I-index

In this section we will see how $N_c$, the $h$-index and the $I$-index are three
independent parameters, and how together they can provide a comprehensive idea
of an author’s overall research merit. We will also see the advantages of
choosing them over other available parameters.

First we note that, irrespective of the values of $N_c$ and the $h$-index, the
$I$-index can take any value between 0 and 100, depending on the number of
coauthors of the published papers (as explained above; see Characteristic (d)
of the $I$-index). Theoretically, $0\% < I \le 100\%$ for any fixed non-zero
values of $N_c$ and the $h$-index.

Now for a fixed non-zero value of the $h$-index, the minimum possible value
of $N_c$ is $h^2$, while the maximum value can be any large number depending on
the number of citations received by the individual papers within the $h$-core.
Theoretically, $h^2 \le N_c < \infty$ for any fixed non-zero values of the
$h$-index and the $I$-index.

For a fixed non-zero value of $N_c$, the minimum possible value of the $h$-index
is 1, while the maximum value is $\lfloor\sqrt{N_c}\rfloor$ if the number of
papers $N_p \ge \lfloor\sqrt{N_c}\rfloor$; otherwise the maximum value is $N_p$.
Theoretically, $1 \le h \le h_{max}$ for any fixed non-zero values of $N_c$ and
the $I$-index. Here $h_{max} = \lfloor\sqrt{N_c}\rfloor$ when
$N_p \ge \lfloor\sqrt{N_c}\rfloor$; otherwise $h_{max} = N_p$. Note that
$\lfloor x \rfloor$ is the usual mathematical floor function.
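
The upper bound on the $h$-index stated above amounts to taking the smaller of $N_p$ and $\lfloor\sqrt{N_c}\rfloor$. A minimal sketch follows; the function name `h_max` and its argument names are chosen here for illustration only.

```python
import math


def h_max(n_citations, n_papers):
    """Largest h-index compatible with N_c citations spread over N_p papers."""
    if n_citations <= 0 or n_papers <= 0:
        raise ValueError("N_c and N_p must both be positive")
    # h papers with at least h citations each need at least h*h citations,
    # and h can never exceed the number of papers.
    return min(n_papers, math.isqrt(n_citations))


print(h_max(55, 10))  # 7: limited by floor(sqrt(55))
print(h_max(80, 5))   # 5: limited by the number of papers
```

`math.isqrt` returns the exact integer floor of the square root, which avoids floating-point rounding for large citation counts.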

I will now give three elementary examples to illustrate that the three
parameters are independent and that each parameter carries important
information not contained in the other two. First, consider two researchers,
each with 10 papers, whose papers are cited as follows (when arranged in
descending order of citation count): the first paper is cited 10 times, the
second 9 times, and so on (i.e., the $i$-th paper is cited $(11 - i)$ times).
In this example, $N_c = 55$ and $h = 5$ for both authors. If, in addition, the
first researcher wrote all his/her papers with one more author (two authors
per paper) and the second researcher wrote all his/her papers with two more
authors (three authors per paper), then $I = 50\%$ for the first researcher and
$I = 33.33\%$ for the second. This shows that even when two researchers have the
same $N_c$ and $h$ values, they can have quite different $I$ values depending on
the number of coauthors. A smaller value of $I$ signifies that the researcher
does more collaborative work. In the second example, the first researcher
has 12 papers, each cited 8 times and coauthored by two, while the second
researcher has 10 papers, each cited 8 times and coauthored by two. In this
case, $h = 8$ and $I = 50\%$ for both researchers, but $N_c = 96$ for the first
researcher and $N_c = 80$ for the second. So here the total scientific impact of
the first researcher is greater than that of the other, even though their $h$
and $I$ values are the same. In the third example, the first researcher has 10
papers, each cited 8 times and coauthored by two, while the second researcher
has 20 papers, each cited 4 times and coauthored by two. In this case,
$N_c = 80$ and $I = 50\%$ for both researchers, but $h = 8$ for the first
researcher and $h = 4$ for the other. In this example the first researcher has
more significant papers than the other; in other words, the first researcher’s
quality of research work is better than that of the second, even though their
$N_c$ and $I$ values are the same.
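
The three examples can be reproduced numerically. The sketch below assumes equidistribution of credit; the helper names `h_index` and `triplet` are introduced here for illustration and are not notation from the paper.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def triplet(papers):
    """Return (N_c, h, I in percent) for a list of (citations, n_authors)."""
    n_c = sum(c for c, _ in papers)
    c_share = sum(c / n for c, n in papers)
    return n_c, h_index([c for c, _ in papers]), 100.0 * c_share / n_c


# Example 1: same citation profile, two versus three authors per paper.
ex1_first = [(11 - i, 2) for i in range(1, 11)]
ex1_second = [(11 - i, 3) for i in range(1, 11)]
# Example 2: 12 versus 10 papers, each cited 8 times, two authors each.
ex2_first, ex2_second = [(8, 2)] * 12, [(8, 2)] * 10
# Example 3: 10 papers cited 8 times versus 20 papers cited 4 times.
ex3_first, ex3_second = [(8, 2)] * 10, [(4, 2)] * 20

for label, papers in [("1a", ex1_first), ("1b", ex1_second),
                      ("2a", ex2_first), ("2b", ex2_second),
                      ("3a", ex3_first), ("3b", ex3_second)]:
    n_c, h, i = triplet(papers)
    print(label, n_c, h, round(i, 2))
```

In each pair two of the three values coincide while the third differs, which is the sense in which the parameters are independent.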

From the above discussion it is clear that $N_c$, the $h$-index and the
$I$-index can take values independently (within their bounds). These three
parameters or metrics quantify the three most important aspects of a
researcher’s scholarly output: quantity, quality and the researcher’s own role
in his/her overall success. Each of these independent parameters carries
important new information; if we miss one, the description of a researcher’s
merit will be highly incomplete. This shows why a single parameter, however
smartly defined, would be insufficient and too coarse to describe a
researcher’s scholarly output.

Now I will discuss why I choose $N_c$, the $h$-index and the $I$-index, rather
than other possible parameters, as the preferred ones to quantify the three
separate aspects of an author’s research output.

The $h$-index is known to be the best single parameter which somewhat
successfully quantifies the first two aspects of one’s research output, i.e.,
the collective impact and the productivity (or, put another way, the quality
and quantity). But most of the time this parameter greatly underestimates the
total impact of an author’s research output. For example, two authors having
the same value of the $h$-index can have widely different collective impact if
one of them has some very highly cited papers within his/her $h$-core. This
makes it necessary to choose a separate parameter to represent the collective
impact of an author’s research output; the total citations, or $N_c$, is the
natural choice for this purpose. The advantages of using these two parameters
are that they have simple and easy-to-calculate definitions and can provide a
very efficient and comprehensive description of the first two aspects of one’s
research output.

The concern about accounting for coauthors’ contributions is not new and has
been considered in many previous works [2,3,4,7,8,9,10,11,12,13,14]. Now I
will argue why the $I$-index does a reasonably good job of quantifying the
third aspect of one’s research output, i.e., an author’s own contribution to
his/her published works. In most of the related works I know, all three
aspects of one’s research output were quantified by a single unbounded
parameter or by a coauthor ranking algorithm. But as we have emphasized
several times, any single parameter (or any ranking algorithm which assigns
a score to each coauthor) will be unsatisfactory in describing an author’s
research output due to serious loss of information. Moreover, it is not clear
from those works whether the consideration of coauthorship would discourage
true collaborations (for a notable exception, see Ref. [2]). No bibliometric
indicator should discourage scientists from honest collaboration, which is
imperative for the progress and betterment of science. The ranking algorithms
have additional problems. Generally they are computationally expensive for a
large number of authors sharing an even larger number of papers. In practice,
hundreds of authors can be connected to each other by a coauthorship network
and may share thousands of papers (sometimes it is not even practical to
obtain a complete set of authors sharing papers among them). Since in
principle the ranking algorithms should simultaneously rank all these authors
(and also papers) by solving equations involving large matrices (representing
authors, papers and their interconnections), it looks very unlikely that these
algorithms can practically resolve the coauthorship issue. Additionally,
because of the complex computation (normally involving iterative matrix
manipulations [10]), the ranking loses its intuitive meaning (or
comprehensibility) for the wider population. In contrast to these works, in
this paper we do not try to quantify all the aspects of one’s research output
by a single parameter. The $I$-index proposed here is a complementary metric,
meant to quantify only one aspect of an individual’s research output. It has a
simple intuitive meaning (cf. Eq. [1]), is easy to calculate, and is argued to
provide a reasonably good measure even with a simple scheme of equidistribution
of credit (see Argument (1) and Argument (2) given earlier). The $I$-index,
being a bounded parameter (it varies from 0 to 100), will be very helpful in
judging authors according to their performance in the third aspect of research
output. A high value of the $I$-index signifies that the author works more
independently (see also the discussion in Sec. 3.1).

An important advantage of considering the $I$-index separately, besides $N_c$
and the $h$-index, is that it will discourage the unethical practice of
giving/taking authorship to/by non-contributing authors. This will probably
not, though, deter scientists from true collaborations, as otherwise their
$N_c$ and $h$-index will not improve.


3.1 Ranking of authors 

It is always difficult to make a merit list for authors. But when it is needed, 
how do we do it? Here I will discuss some practical ways of ranking authors. 

First I will discuss how this can be done using the three independent
parameters discussed in this paper. In fact, using three independent
parameters, the ranking can be done in different ways depending on which aspect
of research is considered more important (for, say, a particular job). Three
independent parameters naturally give employers more freedom to choose
candidates according to their requirements. For example, if the $h$-index is
considered the most important of the three parameters, one can first try to
rank authors according to their $h$ values. Surely there will be many authors
with the same (or close) $h$ values. One reason for this degeneracy is that the
$h$-index takes only discrete integer values. Authors with the same or close
$h$ values can be ranked using the $I$-index: an author with a better $I$ value
should rank higher. In the next step, if these two parameters do not help to
resolve the ranking among a group of researchers, then their $N_c$ values can
be used to see who is the better performer. If, for a particular job,
employers are looking for a researcher who can work independently, then they
can probably give more importance to the $I$-index. In this case, among the
researchers with $h$ values within a fixed range, the employers can choose
the person who has the highest $I$ value.
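
The two ranking procedures described above can be sketched as follows. The `Author` record and both function names are assumptions made for this illustration; the numbers are taken from four rows of Table 1. For simplicity the sketch breaks only exact ties in $h$, whereas the text also allows authors with merely close $h$ values to be compared by their $I$ values.

```python
from typing import NamedTuple


class Author(NamedTuple):
    name: str
    n_c: int        # total citations N_c
    h: int          # h-index
    i_index: float  # I-index in percent


def rank_by_h_then_i(authors):
    """Sort by h, break ties by I, then by N_c (all in descending order)."""
    return sorted(authors, key=lambda a: (-a.h, -a.i_index, -a.n_c))


def most_independent(authors, h_low, h_high):
    """Among authors with h_low <= h <= h_high, pick the highest I."""
    pool = [a for a in authors if h_low <= a.h <= h_high]
    return max(pool, key=lambda a: a.i_index) if pool else None


authors = [
    Author("D.J. Gross", 44292, 83, 45.64),
    Author("E. Witten", 166563, 179, 74.35),
    Author("C.W.J. Beenakker", 29983, 83, 50.12),
    Author("A. Sen", 25967, 85, 81.62),
]

print([a.name for a in rank_by_h_then_i(authors)])
# ['E. Witten', 'A. Sen', 'C.W.J. Beenakker', 'D.J. Gross']
print(most_independent(authors, 80, 90).name)  # A. Sen
```

Here Beenakker and Gross share $h = 83$, so their order is decided by the $I$-index.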

As I already emphasized, a single parameter/metric will not be sufficient
and reliable in describing an author’s research merit. With this in mind, we
now ask: which parameter shall we use if, for some practical reason, authors
must be ranked by a single parameter? For this purpose I will now define a
normalized $h$-index (written as $\tilde{h}$-index) which combines the
effects/impacts of both the $h$-index and the $I$-index in a rational way.
Subsequently I also propose the $\tilde{h}_T$-index, which additionally takes
care of the seniority issue.


$\tilde{h}$-index and $\tilde{h}_T$-index: Here the idea is to estimate how much
an author would have achieved if he/she had worked alone. Roughly, an author
would have $N_a = N_c \times I/100$ citations for his/her works if he/she had
worked alone (see the definition of the $I$-index). It is shown in Ref. [1] that
the total number of citations ($N_c$) is proportional to $h^2$ (this is a
general trend, with the proportionality constant varying between authors).
Therefore, $N_a = g_1 h^2 \times I/100$, where $g_1$ is the proportionality
constant. Now if $\tilde{h}$ is the expected $h$-index of the author had he/she
worked alone, then $N_a$ should be proportional to $\tilde{h}^2$, i.e.,
$N_a = g_2 \tilde{h}^2$, with $g_2$ being another proportionality constant.
Comparing the two expressions for $N_a$, we get the following relation:
$\tilde{h} = \sqrt{g_1/g_2}\; h \sqrt{I}/10$. It is not easy to find any simple
relation between the two constants $g_1$ and $g_2$. Here I present a rough
argument to show that, for a given individual, the values of these two
constants would not be much different. According to the simplest possible
model discussed in Ref. [1], $g_1 = (1 + c/p)^2/(2c/p)$, where the researcher
publishes $p$ papers per year and each published paper gets $c$ new citations
per year in every subsequent year. Now if the researcher had worked alone, the
value of $p$ would have been smaller. Since an effective collaboration enhances
the quality of papers, we can expect that $c$ would also be smaller if the
researcher worked alone. Due to the collective or cooperative effect of
collaboration, the sum of the impacts of independent individuals is expected to
be smaller than the total impact of the works done in collaboration by those
individuals. Going by this argument, we can say that the ratio $c/p$ will not
be much different whether a researcher works alone or in collaborations. This
implies that the value of $g_2$ is expected to be reasonably close to $g_1$.
This is in accordance with the fact that, irrespective of the collaboration
details of researchers, the proportionality constant $g_1$ takes values from a
small range of numbers (between 3 and 5 [1]). Now since the square root of a
positive number is always closer to 1 than the number itself
($|1 - \sqrt{x}| < |1 - x|$ for $x > 0$, $x \ne 1$), we expect that
$\sqrt{g_1/g_2}$ will be very close to 1 even though $g_1/g_2$ is somewhat away
from 1. Now taking $\sqrt{g_1/g_2} \approx 1$, we get the following formula for
the normalized value of the $h$-index,

    $\tilde{h} = h \sqrt{I}/10.$    (4)
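
Eq. (4) can be checked against Table 1 with a minimal sketch. The function name `normalized_h` is an assumption for this illustration; the inputs are the $h$ and $I$ values listed in Table 1 for E. Witten and T.W. Hänsch.

```python
import math


def normalized_h(h, i_index):
    """Eq. (4): h-tilde = h * sqrt(I) / 10, with I given in percent."""
    if not 0 < i_index <= 100:
        raise ValueError("I must satisfy 0 < I <= 100")
    return h * math.sqrt(i_index) / 10


print(round(normalized_h(179, 74.35), 1))  # 154.3 (E. Witten in Table 1)
print(round(normalized_h(107, 23.97), 1))  # 52.4 (T.W. Hänsch in Table 1)
print(normalized_h(50, 100))               # 50.0: a solo author keeps h
```

The division by 10 appears because $I$ is expressed in percent, so that $\sqrt{I/100} = \sqrt{I}/10$.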


We note that if an author publishes only single-author papers, then his/her
$I = 100$, and consequently his/her $\tilde{h} = h$. This is in accordance with
what one expects for a researcher who always works alone. Experimentalists
do more collaborative work than theorists; so compared to a theorist,
an experimentalist will normally have a higher value of $h$ and a lower value
of $I$. This trend can be seen in the next subsection on results (see Table 1).
For a theorist and an experimentalist of presumably the same calibre, their
values of the $\tilde{h}$-index should be very close even though their $h$ and
$I$ values are quite different. Interestingly, this is what we observe in our
analysis of some established authors (see Sec. 3.2).

Since $\tilde{h}$ depends on both $h$ and $I$, to improve the value of the
$\tilde{h}$-index a researcher needs to improve both of those parameters, or
at least improve one while keeping the other relatively fixed. The advantage of
considering the $\tilde{h}$-index over the original $h$-index is that it will
discourage researchers from engaging in the unethical practice of giving/taking
authorship without substantial contribution. If they do, their $I$-index will
drop and, as a consequence, their $\tilde{h}$-index will also be badly
affected. But this will probably not dissuade researchers from true
collaboration, as otherwise their $h$-index will not improve much and, as a
result, their $\tilde{h}$-index will not get better.

It should be noted that the $\tilde{h}$-index is not an independent parameter;
it is a derived parameter/metric proposed here to help rank authors using a
single parameter. This parameter does not take into consideration the issue
of seniority or length of research career. This can be done by dividing
$\tilde{h}$ by the length of an author’s research career. If $T$ is the time
(in years) between the first publication (cited at least once) and the last
published one, then we define,

    $\tilde{h}_T = \tilde{h}/T.$    (5)


This parameter ($\tilde{h}_T$) takes into consideration both the issue of
coauthorship and the length of the research career. This simple division by
career length has some problems, though. It will be unfavorable for authors who
have taken career breaks. At the same time, it will favor authors whose careers
have ended. This second problem can be somewhat circumvented by taking $T$ as
the time between the first publication and the time of data collection. Note
that, in the mathematical sense, $\tilde{h}_T$ is not a derived parameter,
since $T$ is an independent parameter.
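
The effect of the two choices of $T$ discussed above can be seen in a short sketch of Eq. (5). The function name and the author data below are hypothetical, chosen only to make the arithmetic transparent; the paper does not list the $T$ values it used.

```python
import math


def normalized_h_per_year(h, i_index, first_year, end_year):
    """Eq. (5): h-tilde_T = h-tilde / T, with T = end_year - first_year."""
    t = end_year - first_year
    if t <= 0:
        raise ValueError("the career length T must be positive")
    h_tilde = h * math.sqrt(i_index) / 10   # Eq. (4)
    return h_tilde / t


# Hypothetical author: h = 40, I = 64 %, first cited paper in 1995,
# last paper in 2010, data collected in 2015.
until_last_paper = normalized_h_per_year(40, 64, 1995, 2010)
until_collection = normalized_h_per_year(40, 64, 1995, 2015)
print(round(until_last_paper, 2), round(until_collection, 2))
```

Measuring $T$ up to the time of data collection gives the smaller value for an author who has stopped publishing, which removes the advantage that Eq. (5) would otherwise give to ended careers.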

3.2 Some results 


I have estimated the three independent parameters/metrics ($N_c$, $h$ and $I$)
for some established researchers. The list is prepared carefully to represent
researchers working in different fields and belonging to different age groups
(there is an age gap of 25 years between the youngest and the oldest
researcher). The results can be found in Table 1. In the last two columns of
the table, the values of the other two parameters ($\tilde{h}$-index and
$\tilde{h}_T$-index) are also given. As I discussed in Sec. 3.1, ranking can be
done in different ways depending on how we analyze the research output. In
addition, since the listed researchers work in different (sub)fields, it may
not be appropriate to compare their performance without considering the
publication/citation trends in the (sub)fields (for a discussion, see 21). In
any case, for the completeness of our analysis in this paper, they are ranked
in the table according to their $\tilde{h}$ values. We may note here that,
generally, those having a high $\tilde{h}$-index also have a high
$\tilde{h}_T$-index. For two authors with close $\tilde{h}$ values, one may
have a lower $\tilde{h}_T$ value than the other if he/she has taken a career
break for some reason. This is because a career break acts more harshly on
$\tilde{h}_T$ than on $\tilde{h}$. We also see from the table that the
experimentalists have lower values of the $I$-index than the theorists. This is
because experimentalists generally do more collaborations than theorists (an
experimental paper normally has more authors than a theory paper). For the same
reason, experimentalists generally have a higher $h$-index than theorists of
their age group. This discipline dependency of these two parameters is the
reason we choose the $\tilde{h}$-index to decide the ranking in the table (the
$\tilde{h}$-index combines the effects/impacts of both the $h$-index and the
$I$-index in a rational way). It is interesting to note that, for the two Nobel
Laureates (D.J. Gross, a theorist, and T.W. Hänsch, an experimentalist), the
research output measured by $\tilde{h}$ or $\tilde{h}_T$ is the same or very
close even though their $h$-index and $I$-index are quite different.

Table 1 The values of the parameters/metrics $N_c$, $h$, $I$, $\tilde{h}$ and
$\tilde{h}_T$ are given for some established authors. The age of an author is
given in brackets just after his/her name, followed by his/her specialization
and major awards (if any). Here, TP = Theoretical Physics, EP = Experimental
Physics, HE = High Energy physics, CM = Condensed Matter physics, AMO = Atomic,
Molecular and Optical physics, QI = Quantum Information science, FM = Fields
Medalist, NL = Nobel Laureate.

Author                 Field, awards  $N_c$   $h$-index  $I$-index (%)  $\tilde{h} = h\sqrt{I}/10$  $\tilde{h}_T$
E. Witten (63)         TP-HE, FM      166563  179        74.35          154.3                       3.9
A. Sen (59)            TP-HE          25967   85         81.62          76.8                        2.3
C.W.J. Beenakker (55)  TP-CM          29983   83         50.12          58.8                        1.8
D.J. Gross (74)        TP-HE, NL      44292   83         45.64          56.1                        1.1
T.W. Hänsch (73)       EP-AMO, NL     51719   107        23.97          52.4                        1.1
C.L. Kane (52)         TP-CM          29471   55         43.26          36.2                        1.3
A.E. Nelson (57)       TP-HE          17153   52         37.81          32.0                        0.9
C. Monroe (49)         EP-AMO-QI      24774   60         19.85          26.7                        1.0

The parameters in the table are extracted from data collected manually in
July 2015 from Google Scholar Citations. In the calculation of the parameters,
not only original research papers but also other scholarly works, like review
articles and books, are considered. Some practical issues may appear while
estimating these parameters. For example, different chapters of a book can
be written by different authors. In this case, if the total citation count of
the book is available, then that number can first be divided by the number of
chapters, and this credit per chapter can next be divided among the coauthors
of a chapter to determine how much credit one author should get. If the
detailed author information of a scholarly work is missing, then the $I$-index
should be calculated by simply ignoring that particular work.
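
The two practical rules above can be sketched as follows. Both function names are hypothetical, and the sketch assumes that "ignoring" a work means leaving it out of both the author's share and the total citations, which the text does not spell out.

```python
def chapter_credit(book_citations, n_chapters, n_chapter_authors):
    """Credit for one author of one chapter of a multi-author book."""
    per_chapter = book_citations / n_chapters   # split the book by chapters
    return per_chapter / n_chapter_authors      # then split among coauthors


def i_index_known_authors(works):
    """I-index (percent) over works whose author count is known.

    works: list of (citations, n_authors); n_authors is None when the
    author information is missing, and such works are skipped entirely.
    """
    known = [(c, n) for c, n in works if n is not None]
    n_c = sum(c for c, _ in known)
    if n_c == 0:
        raise ValueError("no cited work with known author information")
    return 100.0 * sum(c / n for c, n in known) / n_c


# Hypothetical numbers: a book cited 120 times, 10 chapters, 3 authors
# on the chapter of interest.
print(chapter_credit(120, 10, 3))                                # 4.0
print(i_index_known_authors([(10, 2), (6, None), (30, 3)]))      # 37.5
```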


4 Conclusion 

In this paper I have tried to establish a rational and objective framework for
analyzing scientists’ research outputs. The three most important aspects of
someone’s research performance have been identified: collective impact,
productivity and the author’s own contribution to his/her published works. It
is emphasized that we need three independent parameters/metrics to quantify
those three separate aspects reliably. A single parameter will be insufficient
and too coarse to describe an author’s research performance, due to serious
loss of information. A practical advantage of using three independent
parameters for the analysis is that it gives employers more freedom to choose
candidates according to their requirements. I have suggested the following
three parameters for the purpose: the total number of citations ($N_c$), the
$h$-index and the newly defined $I$-index. The $I$-index is defined as an
author’s claim to a percentage of the total citations received by his/her
papers. Besides its simple and comprehensible meaning, this index is very easy
to calculate and is argued to be almost independent of most subjective issues
like affiliation, seniority or career break. It is also argued, using the
central limit theorem, that the most probable value of the $I$-index can be
obtained by the simple scheme of equidistribution of credit among the coauthors
of a paper. The uncertainty associated with the value is normally very small.

It will be highly unfair to researchers working alone or in small groups if
we consider only $N_c$ and the $h$-index to judge their performance.
Researchers sharing time with many collaborators will normally have a large
number of papers and consequently a higher $N_c$ and $h$-index. So it is
crucial to distribute credit among the coauthors and measure how much
contribution one has in his/her scientific achievement. The new index (i.e.,
the $I$-index) proposed in this paper tries to address this crucial issue. A
larger value of the $I$-index signifies that the author works more
independently (this is why the $I$-index can be considered as the
Independence-index). A practical advantage of considering this $I$-index along
with $N_c$ and the $h$-index is that it will discourage scientists from
engaging in the unethical practice of giving/taking authorship to/by
non-contributing scientists. This will, though, probably not deter scientists
from true collaborations, as otherwise their $N_c$ and $h$-index will not
improve.

In this work we have also defined the $\tilde{h}$-index, and subsequently the
$\tilde{h}_T$-index, to rank authors if for some practical reason they must be
ranked using a single parameter. Unlike the $h$-index, the $\tilde{h}$-index
takes into consideration the crucial issue of coauthors’ contributions, while
the $\tilde{h}_T$-index additionally takes care of the seniority issue.

Since a low value of the $I$-index signifies a more collaborative nature of
one’s work, we can define a Collaboration-index or $C$-index as a complementary
index to the $I$-index: $C = 100 - I$. Note that, like $I$, $C$ also takes
values between 0 and 100. A larger $C$ value for a researcher indicates that
his/her work is more collaborative in nature. In a future study, the average
$C$-index for the scientists working in a particular field or in a particular
institute can be estimated; this will tell us in which field or institute
scientists do more collaborative work than in others. Similarly, the average
values of the $C$-index for different countries can be calculated to see in
which country scientists do more collaborative work.
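
As a small illustration of such a group average, the sketch below computes the mean $C$-index for the theorists (TP) and experimentalists (EP) listed in Table 1. The function name `average_c_index` and the grouping into two labels are choices made here; with only eight authors the result is illustrative, not a field-level estimate.

```python
from collections import defaultdict
from statistics import mean


def average_c_index(records):
    """Average C = 100 - I per group; records are (group, I in percent)."""
    groups = defaultdict(list)
    for group, i_index in records:
        groups[group].append(100.0 - i_index)
    return {group: mean(values) for group, values in groups.items()}


# I-index values from Table 1, labelled theory (TP) or experiment (EP).
table_1 = [
    ("TP", 74.35), ("TP", 81.62), ("TP", 50.12), ("TP", 45.64),
    ("EP", 23.97), ("TP", 43.26), ("TP", 37.81), ("EP", 19.85),
]

averages = average_c_index(table_1)
print({group: round(value, 2) for group, value in averages.items()})
# {'TP': 44.53, 'EP': 78.09}
```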